Evaluating the Jaccard-Tanimoto Index on Multi-core Architectures
Authors
Abstract
The Jaccard/Tanimoto coefficient is an important workload, used in a large variety of problems including drug design fingerprinting, clustering analysis, similarity web searching and image segmentation. This paper evaluates the Jaccard coefficient on the Cell/B.E. processor and the Intel® Xeon® dual-core platform. We have developed a novel parallel algorithm for all-to-all Jaccard comparisons, specially suited to the Cell/B.E. architecture, that minimizes DMA transfers and reuses data in the local store. We show that our implementation on Cell/B.E. outperforms the implementations on comparable Intel platforms by 6-20X with full accuracy, and by 10-50X in reduced accuracy mode, depending on the size of the data. In addition to performance, we discuss in detail our efforts to optimize the workload on both the Cell/B.E. and the Intel architectures, and explain how the avenues for optimization differ substantially between the two. Our work shows that the algorithms or kernels employed for the Jaccard coefficient calculation are heavily dependent on the traits of the target hardware.
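For readers unfamiliar with the workload, the following is a minimal serial sketch of an all-to-all Jaccard comparison over bit-vector fingerprints. It is not the paper's Cell/B.E. algorithm: the function names `jaccard` and `all_to_all`, the representation of fingerprints as Python integers, and the convention that two empty fingerprints score 1.0 are all assumptions made for illustration.

```python
from itertools import combinations


def jaccard(a: int, b: int) -> float:
    """Jaccard/Tanimoto coefficient of two bit-vector fingerprints.

    Each fingerprint is assumed to be stored as a non-negative int
    whose set bits mark the features that are present.
    """
    union = bin(a | b).count("1")  # bits set in either fingerprint
    if union == 0:
        # Assumed convention: two empty fingerprints count as identical.
        return 1.0
    intersection = bin(a & b).count("1")  # bits set in both fingerprints
    return intersection / union


def all_to_all(fingerprints):
    """Return the symmetric n x n matrix of pairwise Jaccard scores."""
    n = len(fingerprints)
    sim = [[1.0] * n for _ in range(n)]  # each fingerprint matches itself
    for i, j in combinations(range(n), 2):
        # Compute each unordered pair once and mirror it.
        sim[i][j] = sim[j][i] = jaccard(fingerprints[i], fingerprints[j])
    return sim


if __name__ == "__main__":
    for row in all_to_all([0b1100, 0b1010, 0b0011]):
        print(row)
```

The inner work is a bitwise AND, a bitwise OR, and two population counts per pair, so the n(n-1)/2 comparisons are dominated by data movement for large inputs, which is consistent with the abstract's emphasis on minimizing DMA transfers and reusing data in the local store.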
Similar resources
Correcting Jaccard and other similarity indices for chance agreement in cluster analysis
Correcting a similarity index for chance agreement requires computing its expectation under fixed marginal totals of a matching counts matrix. For some indices, such as Jaccard, Rogers and Tanimoto, Sokal and Sneath, and Gower and Legendre the expectations cannot be easily found. We show how such similarity indices can be expressed as functions of other indices and expectations found by approxi...
Fully Convolutional Architectures for Multi-Class Segmentation in Chest Radiographs
The success of deep convolutional neural networks on image classification and recognition tasks has led to new applications in very diversified contexts, including the field of medical imaging. In this paper we investigate and propose neural network architectures within the context of automated segmentation of anatomical organs in chest radiographs, namely for lungs, clavicles and heart. The pr...
Tanimoto's Best Barbecue: Discovering Regulatory Modules using Tanimoto Scores
We present a combinatorial method for discovering cis-regulatory modules in promoter sequences. Our approach combines “sliding window” approaches with a scoring function based on the so-called Tanimoto score. This allows us to identify sets of binding sites that tend to occur preferentially in the vicinity of each other in a given set of promoter sequences belonging to co-expressed or orthologous ...
Comparison of similarity coefficients used for cluster analysis with dominant markers in maize (Zea mays L)
The objective of this study was to evaluate whether different similarity coefficients used with dominant markers can influence the results of cluster analysis, using eighteen inbred lines of maize from two different populations, BR-105 and BR-106. These were analyzed by AFLP and RAPD markers and eight similarity coefficients were calculated: Jaccard, Sorensen-Dice, Anderberg, Ochiai, Simple-mat...
Implicitly Defined Substructure Fingerprints for Support Vector Machines
For the calculation of the Tanimoto similarity of two molecules, only the patterns that occur in at least one of them are needed. These can be obtained on the fly by a generation method: the substructure set is generated for each of the molecules, and each substructure is checked to see whether it is also contained in the other set. For the Tanimoto Coefficient it is sufficient to know the cardi...